Search AI Products and News

Explore worldwide AI information, discover new AI opportunities

2025-03-27 08:21:37.AIbase

Alibaba Releases Qwen2.5-Omni, a New Generation of End-to-End Multimodal Model

Alibaba Cloud's Tongyi Qianwen (Qwen) team announced the launch of Qwen2.5-Omni, the new end-to-end multimodal flagship model in the Qwen family. Designed for comprehensive multimodal understanding, the model handles text, image, audio, and video input and generates text and natural speech output simultaneously as a real-time streaming response.

2025-03-25 10:03:35.AIbase

Alibaba Unveils Qwen2.5-VL-32B: A New Multimodal Model Combining Vision, Language, and Mathematical Reasoning

Alibaba has open-sourced its latest multimodal model, Qwen2.5-VL-32B-Instruct. The model is part of the Qwen2.5-VL series, which also includes 3B, 7B, and 72B versions. The 32B version prioritizes convenient local execution while maintaining performance. Enhanced through reinforcement learning, Qwen2.5-VL-32B excels in several areas; notably, its responses are more aligned with human expectations.

2025-03-13 08:52:11.AIbase

Google Open-Sources Next-Generation Multimodal Model Gemma-3: Superior Performance, 10x Lower Cost

Google CEO Sundar Pichai announced at a launch event that Google has open-sourced its latest multimodal large model, Gemma-3. The model is attracting significant attention for its low cost and high performance. Gemma-3 comes in four sizes: 1 billion, 4 billion, 12 billion, and 27 billion parameters. Notably, the largest, 27-billion-parameter model requires only a single H100 GPU for efficient inference, while similar models often require ten times the computing power.

2025-03-12 10:16:39.AIbase

Alibaba Tongyi Team Open-Sources R1-Omni: A Multimodal Model for Transparent Audio-Visual Information

2025-03-12 08:21:57.AIbase

Alibaba's Tongyi Open-Sources R1-Omni Model for Enhanced Multimodal Emotion Recognition

On March 11th, the Tongyi Lab team announced the open-sourcing of the R1-Omni model, marking a significant advancement in multimodal model development. This model integrates reinforcement learning with verifiable reward (RLVR) methods, focusing on improving reasoning capabilities and generalization performance in multimodal emotion recognition tasks. R1-Omni's training is divided into two stages. In the cold-start phase, the team fine-tuned the model using a combined dataset containing 580 video clips sourced from Explainable Multimodal Emotion...

2025-03-10 16:04:04.AIbase

Huawei Ascend and Step-Star Launch Open-Source Multimodal Model, Entering New AI Territory

Recently, the Modelers community officially launched Step-Video and Step-Audio, two open-source multimodal large models developed by Step-Star. These models are designed for video generation and voice interaction, respectively, aiming to provide developers and enterprise users with more powerful AI tools. Step-Video, formally known as Step-Video-T2V, is a 30-billion parameter model, making it the world's largest open-source video generation model. This model can directly generate 20...

2025-03-04 09:41:14.AIbase

Huazhong University of Science and Technology and ByteDance Launch Liquid: Redefining Multimodal Model Generation and Understanding

2025-02-21 15:58:33.AIbase

Aliyun Modao Launches Two Latest Open Source Multimodal Models - Jump Star

2025-01-28 10:34:39.AIbase

DeepSeek Springs a Late-Night Surprise with the Launch of Its New Multimodal Model Janus-Pro

2024-12-18 17:52:23.AIbase

New Breakthrough in Multimodal Models: Fei-Fei Li's Team Unifies Actions and Language, Not Only Understanding Commands but also Reading Implicit Emotions

2024-12-10 08:03:30.AIbase

Zhipu AI Launches Free Multimodal Model GLM-4V-Flash: Enhancing Image Processing Accuracy

Beijing Zhipu Huazhang Technology Co., Ltd. announced that its Zhipu Open Platform, BigModel, has launched the first free multimodal API, GLM-4V-Flash. The new model builds on the capabilities of the 4V series, improves image-processing accuracy, and further lowers the barrier for developers applying large models across various fields.

2024-11-30 10:01:37.AIbase

Zhipu AI Open-Sources GLM-Edge Series of On-Device Large Language and Multimodal Models

Zhipu Technology recently announced that it has open-sourced the GLM-Edge series of on-device large language and multimodal models, an important step for the company toward real-world on-device use cases. The series consists of four models, GLM-Edge-1.5B-Chat, GLM-Edge-4B-Chat, GLM-Edge-V-2B, and GLM-Edge-V-5B, which are optimized for mobile platforms such as smartphones and vehicle systems, as well as desktop platforms like PCs.

2024-11-19 13:51:41.AIbase

Peking University Team Releases Multimodal Model LLaVA-o1, Inference Capabilities Comparable to GPT-o1!

Recently, research teams from Peking University announced the release of an open-source multimodal model called LLaVA-o1, which is claimed to be the first visual language model capable of spontaneous and systematic reasoning, comparable to GPT-o1. The model excels in six challenging multimodal benchmark tests, with its 11B parameter version outperforming competitors such as Gemini-1.5-pro, GPT-4o-mini, and Llama-3.2-90B-Vision-Instruct.

2024-11-19 09:54:07.AIbase

Mistral Launches the Most Powerful Open Source Multimodal Model Pixtral Large, Upgrading Le Chat to Directly Call Flux Pro

2024-10-25 11:16:59.AIbase

Salesforce AI Research Unveils New Multimodal Model BLIP-3-Video: Cost-Effective Video Understanding

2024-09-27 17:37:02.AIbase

Super Powerful Multimodal Model Emu3: Understanding Images and Videos Through Next-Token Prediction

2024-09-26 14:34:11.AIbase

The Open Source Multimodal Model Molmo Can Recognize Objects in Images and Generate Accurate Descriptions

Recently, an open-source multimodal AI model named Molmo has drawn widespread attention in the industry. Based on Qwen2-72B and using OpenAI's CLIP as its visual processing engine, the system is challenging the dominance of commercial models with its performance and features. Molmo's standout characteristic is its efficiency: despite its relatively small size, it can compete with models ten times larger. This 'small but exquisite' design philosophy not only enhances the model's

2024-08-13 08:15:52.AIbase

Starred Over Ten Thousand! The MiniCPM-V2.6 Model of WallFacer Intelligence Tops GitHub

Version 2.6 of WallFacer's MiniCPM-V series has rapidly climbed into the top 3 on the GitHub and HuggingFace trending lists, surpassing ten thousand stars. Since the series was released in February, it has accumulated over a million downloads, becoming a benchmark for on-device model capabilities. With 8 billion parameters, MiniCPM-V2.6 improves on-device multimodal performance, including real-time video understanding, multi-image joint understanding, and multi-image in-context learning, with a memory footprint of only 6GB after quantization and an inference speed of up to 18 tokens per second.

2024-08-02 09:04:21.AIbase

Google Launches Powerful Multimodal Model Gemini 1.5 Pro, Outranking GPT-4o and Claude-3.5 Sonnet

Google has released an experimental version (0801) of its latest model, Gemini 1.5 Pro, through Google AI Studio and the Gemini API. The model leads the LMSYS leaderboard with an Elo score of 1300, surpassing OpenAI's GPT-4o and Anthropic's Claude-3.5 Sonnet. Gemini 1.5 Pro excels in multilingual tasks, mathematics, coding, and visual tasks, and features a context window of 2 million tokens.

2024-07-31 17:56:44.AIbase

Shusheng · Puyu Lingbi Multimodal Model Upgrade Version 2.5 Supports Longer Contexts and Image-Video Understanding Comparable to GPT-4V

Shusheng · Puyu Lingbi (InternLM-XComposer) Version 2.5, developed by the Shanghai Artificial Intelligence Laboratory, focuses on long-context input and output: trained on interleaved image-text contexts of 24K, it operates smoothly at lengths of up to 96K. Key upgrades include high-resolution image understanding, fine-grained video understanding, and multi-turn multi-image dialogue. In application, it can create web pages and write high-quality articles that combine text and images. Evaluations show it surpasses state-of-the-art open-source models across 16 benchmark tests and is on par on key tasks with GPT-4V and Gem.
